Methods and apparatus for data analysis

ABSTRACT

Methods and apparatus for data analysis according to various aspects of the present invention identify statistical outliers in data, such as test data for components. The outliers may be identified and categorized according to the distribution of the data. In addition, outliers may be identified according to multiple parameters, such as spatial relationships, variations in the test data, and correlations to other test data.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application:

is a continuation-in-part of PCT Patent Application Serial No. PCT/US2007/062366, filed on Feb. 17, 2007, entitled “Methods and Apparatus for Data Analysis”;

is a continuation-in-part of U.S. patent application Ser. No. 11/535,851, filed on Sep. 27, 2006, entitled “Methods and Apparatus for Hybrid Outlier Detection”; and

is a continuation-in-part of U.S. patent application Ser. No. 10/817,750, filed on Apr. 2, 2004, entitled “Methods and Apparatus for Data Analysis”;

and incorporates the disclosure of each application by reference. To the extent that the present disclosure conflicts with any referenced application, however, the present disclosure is to be given priority.

BACKGROUND OF THE INVENTION

Semiconductor companies test components to ensure that the components operate properly. Test data may come from a variety of sources, such as parametric electrical testing, optical inspection, scanning electron microscopy, energy dispersive x-ray spectroscopy, and focused ion beam processes for defect analysis and fault isolation. Testing is typically performed before device packaging (at wafer level) as well as upon completion of assembly (final test).

Gathering and analyzing test data is expensive and time consuming. Automatic testers apply signals to the components and read the corresponding output signals. The output signals may be analyzed to determine whether the component is operating properly. Each tester generates a large volume of data. For example, each tester may perform 200 tests on a single component, and each of those tests may be repeated 10 times. Consequently, a test of a single component may yield 2000 results. Because each tester is testing 100 or more components an hour and several testers may be connected to the same server, the test process generates an enormous amount of data.

Furthermore, much of the data interpretation is performed manually by engineers who review the data and make deductions about the test and manufacturing process based on their experience and familiarity with the fabrication and test process. Although manual analysis is often effective, engineers understand the fabrication and test systems differently, and are thus prone to arriving at different subjective conclusions based on the same data. Another problem arises when experienced personnel leave the company or are otherwise unavailable, for their knowledge and understanding of the fabrication and test system and the interpretation of the test data cannot be easily transferred to other personnel.

BRIEF DESCRIPTION OF THE DRAWING

A more complete understanding of the present invention may be derived by referring to the detailed description and the claims when considered in connection with the following illustrative figures, which may not be to scale. Like reference numbers refer to similar elements throughout the figures.

FIG. 1 is a block diagram of a test system according to various aspects of the present invention and associated functional components;

FIG. 2 is a plot of test results including outliers and failures;

FIG. 3 is a diagram of a system for automatically selecting one or more outlier identification algorithms;

FIG. 4 is a plot of test results including hidden outliers;

FIG. 5 is a diagram of a wafer and a sample path followed by a test prober over the wafer;

FIG. 6 is a diagram of variations of test data as the prober follows the path;

FIG. 7 is a flow diagram for identifying hybrid outliers;

FIG. 8 is a diagram of a wafer and a spatial analysis window;

FIG. 9 is a diagram of potential filter types for proximity analysis;

FIG. 10 is a representation of an asymmetrical distribution;

FIGS. 11A-B are correlation charts of test data for two different tests; and

FIG. 12 is a map of test results for a wafer.

Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the connections and steps performed by some of the elements in the figures may be exaggerated or omitted relative to other elements to help to improve understanding of embodiments of the present invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The present invention may be described in terms of functional block components and various process steps. Such functional blocks and steps may be realized by any number of hardware or software components configured to perform the specified functions. For example, the present invention may employ various testers, processors, storage systems, processes, and algorithms, such as statistical engines, memory elements, signal processing elements, neural networks, pattern analyzers, logic elements, programs, and the like, which may carry out a variety of functions under the control of one or more testers, microprocessors, or other control devices. In addition, the present invention may be practiced in conjunction with any number of test environments, and each system described is merely one exemplary application for the invention. Further, the present invention may employ any number of conventional techniques for data analysis, component interfacing, data processing, component handling, and the like.

Referring to FIG. 1, a method and apparatus according to various aspects of the present invention operates in conjunction with a test system 100 having a tester 102, such as automatic test equipment (ATE) for testing semiconductors. In the present embodiment, the test system 100 comprises a tester 102 and a computer system 108. The test system 100 may be configured for testing any components 106, such as semiconductor devices on a wafer, circuit boards, packaged devices, or other electrical or optical systems. Various aspects of the present invention, however, may be applied to many environments having multiple data points, such as credit card fraud detection, athlete performance analysis, voting irregularity analysis, and severe weather prediction. In the present embodiment, the components 106 comprise multiple integrated circuit dies formed on a wafer or packaged integrated circuits or devices. The components 106 are created using a fabrication process, which may comprise any suitable manufacturing process for creating the components 106, and may include a test process, which may comprise any suitable process for testing the operation of the components 106.

The tester 102 suitably comprises any test equipment that tests components 106 and generates output data relating to the testing, and may comprise multiple machines or other sources of data. The tester 102 may comprise a conventional automatic tester, such as a Teradyne tester or the like, and suitably operates in conjunction with other equipment for facilitating the testing. The tester 102 may be selected and configured according to the particular components 106 to be tested and/or any other appropriate criteria.

The tester 102 may operate in conjunction with the computer system 108 to, for example, program the tester 102, load and/or execute the test program, collect data, provide instructions to the tester 102, analyze test data, control tester parameters, and the like. In the present embodiment, the computer system 108 receives tester data from the tester 102 and performs various data analysis functions independently of the tester 102. The computer system 108 may comprise a separate computer having a processor 110 and a memory 112, such as a personal computer or workstation, connected to or networked with the tester 102 to exchange signals with the tester 102. In an alternative embodiment, the computer system 108 may be omitted from or integrated into other components of the test system 100, and various functions may be performed by other components, such as the tester 102 or elements connected to the network.

The memory 112 suitably stores a component identifier for each component 106, such as x-y coordinates corresponding to a position of the component 106 on a wafer map for the tested wafer. Each x-y coordinate in the memory 112 may be associated with a particular component 106 at the corresponding x-y coordinate on the wafer map. Each component identifier has one or more fields, and each field corresponds, for example, to a particular test performed on the component 106 at the corresponding x-y position on the wafer, a statistic related to the corresponding component 106, or other relevant data. The memory 112 may be configured to include any data identified by the user as desired according to any criteria or rules.
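By way of illustration only, the component identifier and its fields might be held in a structure like the following minimal sketch. The names `ComponentRecord`, `store_field`, and the example field names are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentRecord:
    """Hypothetical record for one component, keyed by its wafer-map position."""
    x: int
    y: int
    # Each field maps a name (a test, a statistic, or other data) to a value.
    fields: dict = field(default_factory=dict)


def store_field(wafer_map, x, y, name, value):
    """Attach a named field to the component at (x, y), creating the record if needed."""
    record = wafer_map.setdefault((x, y), ComponentRecord(x, y))
    record.fields[name] = value
    return record


# Illustrative data only: one component with a test result and a bin number.
wafer_map = {}
store_field(wafer_map, 12, 7, "leakage_current_uA", 0.42)
store_field(wafer_map, 12, 7, "bin", 1)
```

Keying the map by the x-y coordinate keeps the lookup of spatially related components straightforward.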

The computer 108 of the present embodiment also suitably has access to a storage system, such as another memory (or a portion of the memory 112), a hard drive array, an optical storage system, or other suitable storage system. The storage system may be local, like a hard drive dedicated to the computer 108 or the tester 102, or may be remote, such as a hard drive array associated with a server to which the test system 100 is connected. The storage system may store programs and/or data used by the computer 108 or other components of the test system 100. In the present embodiment, the storage system comprises a database 114 available via a remote server 116 comprising, for example, a main production server for a manufacturing facility. The database 114 stores tester information, such as tester data files, master data files for operating the test system 100 and its components, test programs, downloadable instructions for the test system 100, and the like. In addition, the storage system may comprise complete tester data files, such as historical tester data files retained for analysis.

The test system 100 may include additional equipment to facilitate testing of the components 106. For example, the present test system 100 includes a device interface 104, like a conventional device interface board and/or a device handler or prober, to handle the components 106 and provide an interface between the components 106 and the tester 102. The test system 100 may include or be connected to other components, equipment, software, and the like to facilitate testing of the components 106 according to the particular configuration, application, environment of the test system 100, or other relevant factors. For example, in the present embodiment, the test system 100 is connected to an appropriate communication medium, such as a local area network, intranet, or global network like the internet, to transmit information to other systems, such as the remote server 116.

The test system 100 may include one or more testers 102 and one or more computers 108. Further, the computer 108 may be separate from the tester 102, or may be integrated into the tester 102, for example utilizing one or more processors, memories, clock circuits, and the like of the tester 102 itself. In addition, various functions may be performed by different computers.

A test system 100 according to various aspects of the present invention tests the components 106 and provides enhanced analysis and test results. For example, the enhanced analysis may identify incorrect, questionable, or unusual results. The test system 100 may also analyze multiple sets of data, such as data taken from multiple wafers and/or lots of wafers, to generate composite data based on multiple datasets. Further, the test data may include data from multiple sources, such as process control or electrical test (ET) data relating to the electrical characteristics for various points on the wafer and/or for the components 106, bin map data for one or more wafers indicating the pass/fail binning classifications for the components 106, outlier data and outlier signature data, and outlier classification data, such as categorizations of outliers as small, medium, or critical according to selected criteria. Various data may also be used by the test system 100 to diagnose characteristics in the fabrication, test, and/or other process, such as problems, inefficiencies, potential hazards, instabilities, or other aspects that may be identified via the test data. The operator, such as the product engineer, test engineer, manufacturing engineer, device engineer, or other personnel using the test data and analyses, may then use the results to verify and/or improve the test system 100 and/or the fabrication system and classify the components 106.

The test system 100 according to various aspects of the present invention executes an enhanced test process for testing the components 106 and collecting and analyzing test data. The test system 100 suitably operates in conjunction with a software application executed by the computer 108. The software application of the present embodiment includes multiple elements for implementing the enhanced test process, including a configuration element, a supplementary data analysis element, and an output element. The test system 100 may also include a composite analysis element for analyzing data from more than one dataset. Further, the test system may include a diagnostic system for identifying characteristics and potential problems using the test data.

Each software element suitably comprises a software module operating on the computer 108 to perform various tasks. Generally, the configuration element prepares the test system 100 for testing and analysis. In the supplementary data analysis element, output test data from the tester 102 and/or other sources are analyzed to generate supplementary test data, suitably at run time and automatically, in conjunction with an in-line process, or after processing. The supplementary test data is then transmitted to the operator or another system, such as the composite analysis element, the diagnostic system, and/or the output element.

The test system 100 commences a test run, for example in conjunction with a conventional series of tests, in accordance with a test program. The tester 102 may perform multiple tests on each component 106 on a wafer or the wafer itself, and each test may be repeated several times on the same component 106. The tests may comprise any appropriate tests, such as (but not limited to) continuity, supply current, leakage current, parametric static, parametric dynamic, and functional and stress tests. Test data from the tester 102 is stored for quick access and supplemental analysis as the test data is acquired. The data may also be stored in a long-term memory for subsequent analysis and use.

As the tester 102 generates the test results, the output test data for each component, test, and repetition is stored by the tester 102 in a tester data file. The output test data received from each component 106 is analyzed by the tester 102 to classify the performance of the component 106, such as into a particular bin classification, for example by comparison to the upper and lower test limits, and the results of the classification are also stored in the tester data file. The tester data file may include additional information as well, such as logistics data and test program identification data. The tester data file is then provided to the computer 108 in an output file, such as a standard tester data format (STDF) file, and stored in memory. The tester data file may also be stored in the storage system for longer term storage for later analysis, such as by the composite analysis element.

When the computer 108 receives the tester data file, the supplementary data analysis element analyzes the data to provide enhanced output results. The computer 108 may provide any appropriate analysis of the tester data to achieve any suitable objective. For example, the supplementary data analysis element 206 may implement a statistical engine for analyzing the output test data and identifying data and characteristics of the data of interest at run time or later. The data and characteristics identified may be stored, while data that is not identified may be otherwise disposed of, such as stored or discarded. The supplementary data analysis element 206 may perform the supplementary data analysis at run time, as an in-line process, or as an off-line process. The present supplementary data analysis is performed as an in-line process, i.e., as an automatic function integrated into the testing process.

The computer 108 may perform additional analysis functions upon the generated statistics and the output test data. Each test generates at least one result for at least one of the components. Referring to FIG. 2, an exemplary set of test results for a single test of multiple components comprises a first set of test results having statistically similar values and a second set of test results characterized by values that stray from the first set. Each test result may be compared to an upper test limit and a lower test limit. If a particular result for a component exceeds either limit, the component may be classified as a “bad part” or otherwise classified according to the test and/or the test result.

Some of the test results in the second set that stray from the first set may exceed the control limits, while others do not. For the present purposes, those test results that stray from the first set but do not exceed the control limits or otherwise fail to be detected are referred to as “outliers”. Outliers are generally considered to be observations which appear to be inconsistent with the remainder of a set of data. The outliers in the test results may be identified and analyzed for any appropriate purpose, such as to identify potentially unreliable components. The outliers may also be used to identify various potential problems and/or improvements in the test and manufacturing processes.

Analyzing each relevant datum according to the selected algorithm suitably identifies the global and/or hybrid outliers. If a particular algorithm is inappropriate for a set of data, the computer 108 may select a different algorithm. The computer 108 may operate in any suitable manner to designate outliers, such as by comparison to pre-selected or dynamic values. For example, an outlier identification system according to various aspects of the present invention initially automatically calibrates its sensitivity to outliers based on selected statistical relationships for each relevant datum or other data. Some of these statistical relationships are then compared to a threshold or other reference point, such as the data mode, mean, or median, or combinations thereof, to define relative outlier threshold limits. In the present embodiment, the statistical relationships are scaled, for example by one, two, three, and six standard deviations of the data, to define the different outlier amplitudes. The output test data may then be compared to the outlier threshold limits to identify and categorize the output test data as outliers.
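The scaled comparison described above may be sketched as follows. This is only one possible reading, assuming the median as the reference point and the population standard deviation as the scale; the function name `outlier_amplitude` and the sample values are hypothetical.

```python
import statistics


def outlier_amplitude(value, center, sigma, scales=(1, 2, 3, 6)):
    """Return the largest scale k with |value - center| >= k * sigma, or 0 if none."""
    distance = abs(value - center)
    amplitude = 0
    for k in sorted(scales):
        if distance >= k * sigma:
            amplitude = k
    return amplitude


# Illustrative results for one test across several components.
data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 12.5]
center = statistics.median(data)   # assumed reference point
sigma = statistics.pstdev(data)    # assumed scale
amplitudes = [outlier_amplitude(x, center, sigma) for x in data]
```

Here only the last value reaches the three-sigma level. Because the standard deviation is itself inflated by that value, a more robust scale may be preferable in practice.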

The computer 108 stores the resulting statistics and outliers, as well as corresponding identifiers, such as the x-y wafer map coordinates. Selected statistics, outliers, and/or failures may also trigger notification events, such as sending an electronic message to an operator, triggering a light tower, stopping the tester 102, or notifying a server.

In the present embodiment, the supplementary data analysis element 206 includes a scaling element 210 and an outlier classification engine 212. The scaling element 210 is configured to dynamically scale selected coefficients and other values, for example according to the output test data. The outlier classification engine 212 is configured to identify and/or categorize the various outliers in the data according to selected algorithms.

More particularly, the scaling element of the present embodiment uses various statistical relationships for dynamically scaling outlier sensitivity. The scaling coefficients may be calculated by the scaling element and used to modify selected outlier sensitivity values. Any appropriate criteria, such as suitable statistical relationships, may be used for scaling.

The outlier classification engine 212 is suitably configured to identify and/or categorize the outliers in the components 106, output test data, and/or analysis results according to any suitable algorithms. In addition, the outlier classification engine 212 may be configured to utilize multiple candidate outlier identification algorithms and identify one or more algorithms suited for identifying outliers in the output test data. Different tests generate different population distributions, such that an outlier identification algorithm that is appropriate for one test may be inappropriate for another. The outlier classification engine 212 is suitably configured to differentiate between different data populations and automatically select one or more outlier identification algorithms based on the data population type of the current data. The automatic selection may select from any appropriate set of candidate outlier identification algorithms, and may perform the selection according to any suitable criteria and analysis.

For example, referring to FIG. 3, the outlier classification engine 212 may be configured to automatically perform an outlier identification algorithm selection process. The outlier identification algorithm selection process may be performed to select one or more appropriate algorithms from multiple algorithms for identifying global and/or hybrid outliers. In one embodiment, the outlier classification engine 212 comprises a pre-processing engine 310 and a classification engine 312. The pre-processing engine 310 suitably generates data to facilitate selection of the relevant outlier identification algorithms. The classification engine 312 suitably selects one or more relevant outlier identification algorithms and identifies the global and/or hybrid outliers accordingly.

The output test data, for example data taken from a particular test, are initially provided to the outlier classification engine 212 to analyze the output test data for compatibility with various candidate outlier identification algorithms. The data may be analyzed in any suitable manner to identify appropriate algorithms for identifying the outliers in the output test data. For example, in the present embodiment, the pre-processing engine 310 receives the output test data and prepares the available outlier identification algorithms, such as by retrieving them from an outlier identification algorithm library stored in memory. The pre-processing engine 310 analyzes the output test data for outliers using several of the available algorithms. In the present embodiment, the pre-processing engine 310 analyzes the output test data using each of the algorithms designated by the user, or another suitable selection of algorithms, to generate pre-processing data, such as outliers as identified by all algorithms and various descriptive statistics, such as minimum, maximum, mean, median, standard deviation, CPK, CPM, and the like.

The algorithms may be based on industry standard (e.g., IQR, median+/−N*sigma, etc.) and/or proprietary, custom, or user-defined outlier identification techniques. The outlier identification algorithm library is suitably configurable by the user, for example to add, remove, or edit outlier identification algorithms, for example according to the particular products under test or the characteristics of the tests to be performed. Different algorithms may be appropriate for different statistical population types, such as normal, logarithmic normal, bimodal, clamped, or low CPK data populations. The candidate outlier identification algorithms may comprise any suitable algorithms for various types and distributions of data, such as inter-quartile range (IQR) normal distribution, 3 sigma; IQR normal distribution, 6 sigma; IQR log normal, 3 sigma; IQR log normal, 6 sigma; bimodal algorithms; clamped algorithms; low capability algorithms; custom algorithms based on 3-, 6-, or n-sigma; and proprietary algorithms having various sensitivities.
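As a minimal sketch of such a library, two of the industry-standard techniques named above (IQR and median+/−N*sigma) might be registered and run over the same data as shown below. The names `ALGORITHM_LIBRARY`, `preprocess`, and the default multipliers are assumptions for illustration; the IQR fence multiplier of 1.5 is a common convention rather than a value taken from this disclosure.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Flag values beyond k inter-quartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in data if x < low or x > high]


def median_sigma_outliers(data, n_sigma=3.0):
    """Flag values more than n_sigma standard deviations from the median."""
    center = statistics.median(data)
    sigma = statistics.pstdev(data)
    return [x for x in data if abs(x - center) > n_sigma * sigma]


# Hypothetical, user-editable library: entries may be added, removed, or replaced.
ALGORITHM_LIBRARY = {
    "iqr": iqr_outliers,
    "median_n_sigma": median_sigma_outliers,
}


def preprocess(data, names):
    """Run each designated algorithm and collect the outliers it identifies."""
    return {name: ALGORITHM_LIBRARY[name](data) for name in names}


results = preprocess([1, 2, 3, 4, 5, 6, 7, 8, 100], ["iqr", "median_n_sigma"])
```

Keeping the algorithms behind a name-to-function mapping lets the later selection step choose among the results by name.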

The pre-processing algorithm results are dynamically selected for global and/or hybrid outlier detection. In the present embodiment, the outlier classification engine 212 analyzes the test results generated by the pre-processing engine 310 to identify the most useful or applicable outlier identification algorithms. The data from the selected outlier identification algorithms may be retained, while the remaining data is discarded. For example, in the present embodiment, the classification engine 312 receives the results of the pre-processing analysis generated by each of the available outlier identification algorithms. The classification engine 312 analyzes the pre-processing data according to any suitable criteria, such as predetermined and/or user-defined recipe-driven rules to determine whether the pre-processing data satisfy various criteria.

The rules may be any appropriate rules, for example employing statistical ratios or values, such as comparing statistics, like minimum, maximum, mean, median, standard deviation, CPK, and CPM, to various thresholds or other criteria. For example, the classification engine 312 may skip the outlier detection process under certain circumstances, such as having too few test results or a distribution among the test results that is too narrow or bimodal. The rules may be pre-selected and/or may be adjusted or added by the user to accommodate specific conditions of the products and test environment. Further, the classification engine 312 may be configured to apply a particular algorithm to a certain type of test, for example when the results of the test are known to have a particular distribution. Other rules may determine whether a particular test is applicable. For example, the classification engine 312 may compare the CPK to a threshold. If the CPK is below the threshold, then the IQR normal outlier identification algorithm may be used. In the present system, results from an algorithm satisfying a rule are used for outlier identification. Other algorithm results for that test are suitably ignored.
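A rule of this kind might be sketched as follows, assuming the conventional definition of CPK as the distance from the mean to the nearer test limit divided by three standard deviations. The function names, the minimum sample count, the CPK threshold of 1.33, and the fallback algorithm name are all illustrative assumptions rather than values from this disclosure.

```python
import statistics


def cpk(data, lower_limit, upper_limit):
    """Conventional process capability index relative to the test limits."""
    mean = statistics.fmean(data)
    sigma = statistics.stdev(data)
    if sigma == 0:
        return float("inf")
    return min(upper_limit - mean, mean - lower_limit) / (3 * sigma)


def select_algorithm(data, lower_limit, upper_limit,
                     min_count=30, cpk_threshold=1.33):
    """Return the name of the algorithm whose results should be kept, or None to skip."""
    if len(data) < min_count:
        return None                      # too few results: skip outlier detection
    if cpk(data, lower_limit, upper_limit) < cpk_threshold:
        return "iqr_normal"              # rule from the text: low CPK selects IQR normal
    return "median_n_sigma"              # assumed default when no rule fires


# Illustrative data: 50 results spread evenly between 9.8 and 10.2.
sample = [10 + 0.1 * ((i % 5) - 2) for i in range(50)]
tight = select_algorithm(sample, 9.5, 10.5)
loose = select_algorithm(sample, 9.0, 11.0)
```

With the tighter limits the CPK falls below the assumed threshold and the IQR normal algorithm is selected; with the looser limits the assumed default is returned.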

The outlier classification engine 212 may also categorize selected global and/or hybrid outliers and components 106 according to the test results and the information generated by the supplementary analysis element 206. For example, the outlier classification engine 212 may be configured to categorize the components 106 into critical/marginal/good part categories, for example in conjunction with user-defined criteria; user-defined good/bad spatial patterns recognition; classification of pertinent data for tester data compression; test setup in-situ sensitivity qualifications and analysis; tester yield leveling analysis; dynamic wafer map and/or test strip mapping for part dispositions and dynamic retest; or test program optimization analyses. The outlier classification engine 212 may classify components 106 and associated data in accordance with conventional SPC control rules, such as Western Electric rules or Nelson rules, to characterize the data.

The outlier classification engine 212 suitably categorizes the data using a selected set of classification limit calculation methods. Any appropriate categorization methods may be used to characterize the data according to the needs of the operator. The present outlier classification engine 212, for example, categorizes outliers by comparing the output test data to selected thresholds, such as values corresponding to one, two, three, and six statistically scaled standard deviations from a threshold, such as a test limit or a data mean, mode, and/or median. The identification of outliers in this manner tends to normalize any identified outliers for any test regardless of datum amplitude and relative noise.

In one embodiment, the outlier thresholds may be defined asymmetrically with respect to a selected center point in the data distribution. Using asymmetrical thresholds may reduce the effects of non-Gaussian distributions and/or the presence of outliers on identifying the outliers. The outlier thresholds may be selected without determining a particular location in the data, such as a mean, or a scale, such as a standard deviation, based on the data, which can be influenced by outliers. For example, the outlier thresholds may be selected according to the distribution of the data, such as by deriving the thresholds from a center point of the frequency distribution of the test data. In one embodiment, referring to FIG. 10, instead of basing the outlier thresholds on standard deviations from the center point, the outlier thresholds 1010, 1012 may be adjusted or selected according to the quartile values 1014, 1016 for the set of data. The quartile values correspond to the median values between the median of the full data set and the edge of the distribution.

In addition, the thresholds 1010, 1012 may be adjusted according to any relevant criteria, such as the distribution of the data. In one embodiment, the thresholds 1010, 1012 are adjusted according to the generalized slope of the data distribution in the relevant area, such as around the first and third quartile points 1014, 1016. For example, the thresholds 1010, 1012 may be adjusted by an amount that is inversely related to the general slope of the data in the relevant area. Consequently, in an area where the slope is lower, the outlier thresholds 1010, 1012 may be farther from the center point than in an area where the slope is higher. Thus, the lower threshold and upper threshold 1010, 1012 may be selected according to the equations:

UCL = Q3 + N*UQW

LCL = Q1 − N*LQW

where UCL and LCL are the upper and lower outlier thresholds 1010, 1012 respectively, Q1 and Q3 are respectively the first and third quartile marks, N is a constant, and UQW and LQW are weights assigned according to an inverse relationship to the slope.
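The two threshold equations may be illustrated with the short sketch below. The disclosure does not give a formula for the weights UQW and LQW, so this example assumes, purely for illustration, that each weight is the distance between the median and the corresponding quartile, which is wider where the data are more spread out. The function name and the data are hypothetical.

```python
import statistics


def asymmetric_thresholds(q1, q3, n, uqw, lqw):
    """Apply UCL = Q3 + N*UQW and LCL = Q1 - N*LQW; returns (LCL, UCL)."""
    ucl = q3 + n * uqw
    lcl = q1 - n * lqw
    return lcl, ucl


# Illustrative right-skewed data set.
data = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10, 15]
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Assumed weights: quartile-to-median distances on each side of the distribution.
uqw = q3 - median
lqw = median - q1

lcl, ucl = asymmetric_thresholds(q1, q3, n=3, uqw=uqw, lqw=lqw)
```

For this data the upper threshold lands much farther from the median than the lower one, reflecting the longer upper tail.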

Additional thresholds may be defined as well, for example to define small, medium, and large outliers. The thresholds and categories may be defined in any manner. For example, the category thresholds may be determined according to scaled statistical relationships, for example by one, two, three, and six standard deviations of the data from the median, to define the different outlier amplitudes. The output test data may then be compared to the outlier threshold limits to identify and categorize the output test data as outliers.

Alternatively, the categories may be defined relative to the outlier thresholds. The demarcations between the various categories may be selected according to any suitable criteria, such as according to the outlier threshold, the test limit, and/or the edge of the distribution. For example, the range between the outlier threshold and the test limit may be divided into two or more categories 1020, 1022, 1024. The range between the outlier threshold and the test limit may be divided into equal areas or different areas to define different magnitudes of outliers. In the present embodiment, each side of the distribution between the outlier threshold and the edge of the distribution is divided into three or four equal categories 1020, 1022, 1024 to define the outliers as large, medium, small, and/or tiny. The size and number of the various categories may be identical on both sides of the distribution, or may vary for each side of the distribution.
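Dividing the range beyond an outlier threshold into equal categories might look like the following sketch. The function name `categorize_by_bands`, the label set, and the choice to clamp values beyond the edge into the last category are assumptions made for illustration.

```python
def categorize_by_bands(value, threshold, edge, labels=("small", "medium", "large")):
    """Place a value into one of len(labels) equal bands between threshold and edge.

    Works for either side of the distribution: on the upper side edge > threshold,
    on the lower side edge < threshold. Returns None if the value is not an outlier.
    """
    span = edge - threshold
    if span == 0:
        raise ValueError("threshold and edge must differ")
    position = (value - threshold) / span   # 0 at the threshold, 1 at the edge
    if position <= 0:
        return None                          # inside the threshold: not an outlier
    index = min(int(position * len(labels)), len(labels) - 1)
    return labels[index]


# Upper side: threshold at 10, edge of the distribution at 16, so bands are 2 wide.
upper_examples = [categorize_by_bands(v, 10.0, 16.0) for v in (9.0, 11.0, 13.0, 15.5)]
```

Passing a different `labels` tuple for each side allows the number of categories to vary between the two sides of the distribution.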

The outlier classification engine 112 analyzes and correlates thenormalized outliers and/or the raw data points based on user-definedrules. The outlier classification engine 112 suitably performs thecategorization according to each test, which may be performedindependently of data from other tests or in conjunction with such datafrom other tests. Any suitable criteria may be used for categorizing thecomponents based on test failures and outliers, such as:

FAIL if the part fails at least one test.

CRITICAL if at least one test has been identified to have a LARGE outlier or at least two MEDIUM outliers on two different tests for the otherwise passing part.

MARGINAL if at least one test has been identified to have a MEDIUM outlier or at least four SMALL outliers on four different tests for the otherwise passing part.

SMALL if at least one test has been identified to have a SMALL outlier for the otherwise passing part.

PASS if the otherwise passing part has no SMALL, MEDIUM, or LARGE outlier.

Criteria for small, medium, and large outliers may be selected according to any suitable criteria, such as thresholds based on the test limits and/or characteristics of the data.
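The exemplary rules above can be sketched as a short decision function. In this Python sketch, the function name and the input layout (a mapping from a hypothetical test name to the outlier category found on that test) are assumptions for illustration; only the rule order and counts come from the description.

```python
from collections import Counter


def classify_part(failed_any_test, outliers_by_test):
    """Apply the FAIL/CRITICAL/MARGINAL/SMALL/PASS rules to one part.

    outliers_by_test maps a test name to "SMALL", "MEDIUM", or "LARGE"
    for each test on which the part showed an outlier (hypothetical layout).
    """
    if failed_any_test:
        return "FAIL"
    # One entry per test, so the counts are counts of distinct tests.
    counts = Counter(outliers_by_test.values())
    if counts["LARGE"] >= 1 or counts["MEDIUM"] >= 2:
        return "CRITICAL"
    if counts["MEDIUM"] >= 1 or counts["SMALL"] >= 4:
        return "MARGINAL"
    if counts["SMALL"] >= 1:
        return "SMALL"
    return "PASS"
```

Because the rules are evaluated from most to least severe, a part with one MEDIUM outlier and several SMALL outliers is reported as MARGINAL rather than SMALL.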

The outlier classification engine 112 may also include a hybrid outlier system configured to identify hybrid outliers, also referred to as local outliers or hidden outliers, in the data. Hybrid outliers are inconsistent data points within a local group of data, but do not cross the thresholds defining the “global” outlier boundaries of the main population. Referring to FIG. 4, a global outlier lies outside the data distribution of the main data population. Hybrid outliers, however, do not lie outside the data distribution of the main data population, but may deviate significantly from the norm in view of multiple parameters. Thus, the outlier classification system may analyze the test data to identify outliers according to two or more parameters. Other parameters may be any appropriate data, such as data for spatially related components, data resulting from correlated tests, or other relevant data. For example, spatial information may be used to identify devices whose parameters are all within the main distribution, but not consistent with data for a more local physical area. Additionally, a high level of correlation between two parameters may be used to identify devices whose parameters are individually consistent with the main distribution, but are not consistent in their relationship to another parameter.

The degree of correlation between two parameters, such as results of two tests, may be assessed in any manner. For example, a correlation index may be calculated using linear or logarithmic regression techniques, such as to establish a best-fit straight line between two data sets. The closer the magnitude of the slope of this line is to unity, the higher the degree of correlation. Determining the degree of correlation may include any relevant considerations or processes, such as removing large outliers or compensating for multi-site variation.

Upon determining an expected degree of correlation, the correlation of test results may be determined to identify outliers that do not conform to the underlying relationship. Referring to FIG. 11A, for tests with a high degree of correlation, all results are expected to reside within a certain distance 1112 from the best-fit straight line 1110. Referring to FIG. 11B, if an outlying result 1114 is present, then the overall correlation index is only slightly affected, but the corresponding data point resides outside the expected distance 1112 from the best-fit straight line 1110, even if the test result remains within the main distribution for the relevant parameter. The computer 108 may also select situations for performing correlation analyses, for example according to the degree of correlation between two parameters, availability of computing resources, and demand for optimal analysis.
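To illustrate the correlation check, the following Python sketch fits a least-squares straight line to paired results from two tests and flags points lying farther than an allowed distance from that line. The function names, the use of ordinary least squares, the perpendicular distance measure, and the max_distance parameter are assumptions; the description leaves the regression technique and the distance 1112 open.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation_outliers(xs, ys, max_distance):
    """Indices of points farther than max_distance from the best-fit line."""
    slope, intercept = fit_line(xs, ys)
    norm = (1 + slope ** 2) ** 0.5  # converts vertical offset to perpendicular
    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        distance = abs(y - (slope * x + intercept)) / norm
        if distance > max_distance:
            flagged.append(i)
    return flagged
```

In the usage below, the sixth device reads 9 on the second test, which is inside that test's overall range of 1 to 10, yet the point is flagged because it does not follow the relationship between the two tests: `correlation_outliers(list(range(1, 11)), [1, 2, 3, 4, 5, 9, 7, 8, 9, 10], 1.0)`.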

In one embodiment, hybrid outliers may be identified in view of the test data for the component relative to data for spatially related components, such as components from a smaller local population of data points, like a temporally or spatially local population. For example, referring to FIG. 5, the tester generates data by testing various components on the wafer sequentially. The tester moves the prober from component to component, accumulating test data for each component. Due to variations in the manufacturing process and materials and the testing process, the test data may vary regularly as the prober traverses the wafer (FIG. 6). Data points 610 lying well outside the variations for the wafer or for multiple wafers are ordinarily classified as global outliers. Data points 604 within or slightly beyond the variations ordinarily escape classification as global outliers, even if the data points 604 significantly differ from the data points for spatially nearby or corresponding components.

Hybrid outliers are suitably identified by analyzing individual raw, normalized, or otherwise processed data points with respect to proximate data points. The outlier classification engine 112 may apply a proximity analysis by comparing parametric or other data for individual components to hybrid outlier thresholds calculated using data for spatially related components. The proximity analysis may also be weighted, for example according to the distance between a central component and a nearby component. Proximity analysis may be performed by any appropriate system, such as a dedicated proximity analysis engine or a proximity analysis engine associated with another task, like a proximity analysis engine used for generating composite representations based on multiple datasets.

In one embodiment, as parametric data is processed, the outlier classification engine 112 may calculate hybrid outlier thresholds for a local data set, such as a selected number of most recently generated data. The data to be analyzed may comprise any suitable data, such as preceding data points, subsequent data points, both preceding and subsequent data points, temporally related data points, or spatially related data points. In addition, the number of data points in the local data set may be selected according to any suitable criteria, such as using a pre-selected number of data points or selecting the number of data points according to various criteria, such as the variability or noisiness of the data. If the data is noisy, the number of data points may be automatically adjusted, for example by increasing the number of data points in the local data set to reduce the effects of the noise. The hybrid outlier thresholds may be dynamically re-calculated with accumulation of new parametric data, for example by using a first-in, first-out (FIFO) calculation. The parametric data for the components in the area may then be compared to the thresholds to identify hybrid outliers.
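A first-in, first-out recalculation of this kind might be sketched as follows in Python. The window length of 8, the multiplier k, and the choice of median plus or minus k standard deviations as the local thresholds are all assumptions for illustration; the description does not fix how the hybrid outlier thresholds are computed.

```python
from collections import deque
import statistics


def rolling_hybrid_outliers(values, window=8, k=3.0):
    """Flag values that deviate from the most recent `window` values.

    Thresholds are recomputed for every new value from a FIFO buffer of
    preceding data points (assumed form: median +/- k * std. deviation).
    """
    recent = deque(maxlen=window)  # oldest value drops out automatically
    flagged = []
    for index, value in enumerate(values):
        if len(recent) == window:
            center = statistics.median(recent)
            spread = statistics.pstdev(recent)
            if spread > 0 and abs(value - center) > k * spread:
                flagged.append(index)
        recent.append(value)
    return flagged
```

In this sketch a flagged value still enters the buffer and widens the next few thresholds; excluding flagged values from the buffer is an equally plausible design choice.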

The outlier classification engine 112 may be configured to identify the hybrid outliers 604 according to any suitable mechanism or process. In the present embodiment, the hybrid outliers 604 are identified in conjunction with an in-line process, i.e., performed automatically via a process that receives the data as it is generated, or after testing is complete and the data is stored, and that automatically provides the results without operator intervention or other action. Alternatively, the hybrid outliers 604 may be identified at run time or after the analysis is completed and the data is stored. The outlier classification engine 112 may be integrated into the test program executed by the tester 102. The test data may comprise any type of test data, such as bin results or parametric testing data. Further, the data may be pre-processed, such as by removing global outliers and/or data for components classified as failures, or normalizing data across various sections of the wafer, such as sections associated with multisite testing or stepper fields.

Referring to FIG. 7, in the present embodiment, the outlier classification engine 112 or other outlier identification system initially filters out global outliers from the data and processes the remaining data to identify hybrid outliers (710). The outlier classification engine 112 may further process the filtered data, for example to normalize the data in view of known sources of variation (712). The outlier classification engine 112 may then identify hybrid outliers according to any suitable technique or criteria. For example, the outlier classification engine 112 may select and/or adjust a window encompassing a selected number and geometry of components to identify spatially related components (714). The outlier classification engine 112 may then determine a neighborhood value for the component corresponding to the test data for the components in the neighborhood (716). The test data for the component may then be analyzed relative to the neighborhood value to identify hybrid outliers.

In addition, the data may be normalized, which comprises adjusting data values to correct for differences in data that may be generated due to known causes that negatively impact the consistency of the data. In the present embodiment, the hybrid outlier system normalizes the test data for each component relative to test data for all other components on the wafer to facilitate comparison. For example, if the test data was generated using multi-site testing in which multiple resources test wafers or components

The data may be normalized in any suitable manner, such as to address known sources of variation. Components sharing a known common source of inconsistency may be normalized using the same normalization criteria. For example, in the present embodiment, parametric test data generated in conjunction with a multi-site testing environment may be normalized according to the following equation:

$\mathit{norm\_site}_{n} = \frac{\mathit{data\_site}_{n} - \mathit{median\_site}_{n}}{\mathit{iqr\_site}_{n}}$

where, for the site n, norm_site_n is the resulting normalized data, data_site_n is the pre-normalized data, median_site_n is the statistical median, and iqr_site_n is the interquartile range for the site analyzed. The normalized data may be stored in any suitable manner, for example on a normalized data device map having the normalized data for each device.
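The per-site normalization can be illustrated with a short Python sketch that subtracts each site's median and divides by that site's interquartile range. The function name and the dictionary layout (site identifier mapped to a list of values) are assumptions, as is the quartile convention, which here is the default of the standard library's statistics.quantiles.

```python
import statistics


def normalize_by_site(data_by_site):
    """Return {site: [(value - site median) / site IQR, ...]}."""
    normalized = {}
    for site, values in data_by_site.items():
        median = statistics.median(values)
        # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        if iqr == 0:
            raise ValueError(f"site {site!r} has zero interquartile range")
        normalized[site] = [(v - median) / iqr for v in values]
    return normalized
```

After normalization, two sites whose raw readings differ only by a constant offset yield the same normalized values, which is what allows devices tested on different sites to be compared directly.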

To detect hybrid outliers, the outlier classification engine 112 analyzes local data, such as data for spatially near components. For example, the outlier classification engine 112 may detect hybrid outliers according to the geometry of the wafer. The hybrid outliers may be identified according to any suitable technique or process. In the present embodiment, the outlier classification engine 112 analyzes the data for nearby components.

In one embodiment, the computer 108 may perform a proximity analysis for each component on the wafer. For example, the computer 108 may be configured to identify hybrid outliers by identifying outliers within subsets of the overall dataset, such as within data for a selected group of components on a particular wafer. In one embodiment, referring to FIG. 8, the computer 108 may establish a pattern or window 810 of a selected size that may be used to select multiple components on a wafer 812. The pattern or window 810 comprises a perimeter defining a spatial area of interest, such as a subset of the components on the wafer 812. The size of the window 810 may be selected or adjusted according to any suitable criteria, such as the number of components proximate the central component, the type of test or component, or the desired sensitivity of the outlier analysis. The shape of the window may likewise be selected according to any appropriate criteria, such as the spatial area of interest. For example, if the area of interest does not facilitate the use of a rectangular window, such as near the curved edges of the wafer 812, an alternative shape may be selected for the window 810 to accommodate the relevant area of the wafer 812.

Referring to FIG. 9, in one embodiment, the outlier classification engine 112 may employ one or more predetermined windows or masks that may be applied to each component in the data set to establish the relevant window. The size and shape of the windows may be selected according to any suitable criteria, such as according to a pre-selected type, an operator designation, or an automatic selection process. The user can pre-define the type and size of the window to be used or the outlier classification engine 112 can automatically adapt the window shape and/or size according to the nature of the area surrounding the relevant component, such as for locations near the edge of the wafer, possible sampling factors, empty spaces on the wafer or missing data, and the like.

In the present embodiment, the outlier classification engine 112 requires a selected amount of data to perform the analysis, such as data for a minimum number of surrounding components in the range of four to eight, for example five. In one embodiment, the devices only count towards the minimum if they satisfy one or more selected criteria, such as being classified as passing or “good” devices. If the initial area within the window has fewer than five other “good” devices, the outlier classification engine 112 may increase the size and/or adjust the shape of the window or pattern until the number of components meets the required number. In one embodiment, the outlier classification engine 112 adjusts the window or pattern by applying a predetermined sequence of patterns, in which the sequence comprises patterns of increasing size. In addition, the outlier classification engine 112 may abort the analysis if a minimum number of acceptable data is not available. For example, the outlier classification engine 112 may only perform the analysis and/or adjust the size and/or shape of the window if the device under study has at least five “bin 1” devices in the initial window. Thus, data for a particular device may be categorized as a hybrid outlier only if there are sufficient data for neighboring components in the initial window. In other embodiments, substitute data may be used. For example, the missing data may be replaced with idealized data representing ideal values for such components, or with data based on other components on the wafer, such as components in surrounding bands around the missing data or components in corresponding positions on other wafers in the lot.
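The window adjustment can be sketched in Python as a loop over a predetermined sequence of patterns of increasing size, stopping at the first pattern that contains enough “good” neighbors. The square patterns, the helper names, and the representation of good devices as a set of (x, y) coordinates are assumptions for illustration; the minimum of five comes from the example in the description.

```python
def square_pattern(radius):
    """Offsets of a (2*radius+1)-wide square window, excluding the center."""
    return [(dx, dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if (dx, dy) != (0, 0)]


# Hypothetical predetermined sequence of patterns of increasing size.
PATTERNS = [square_pattern(1), square_pattern(2), square_pattern(3)]


def select_neighbors(good_dies, center, min_good=5, patterns=PATTERNS):
    """Return neighbor coordinates from the first pattern holding enough
    good devices, or None if even the largest pattern falls short."""
    cx, cy = center
    for pattern in patterns:
        neighbors = [(cx + dx, cy + dy) for dx, dy in pattern
                     if (cx + dx, cy + dy) in good_dies]
        if len(neighbors) >= min_good:
            return neighbors
    return None  # analysis aborted for this device
```

A device in the interior of the wafer is typically satisfied by the smallest pattern, while a device at a corner or next to missing data falls through to a larger one.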

The window 810 is then applied to various areas of the wafer 812, and hybrid outliers are identified among the data for components within the window 810. The outlier classification engine 112 may identify the hybrid outliers according to any appropriate criteria and technique. For example, the outlier classification engine 112 may initially analyze a starting component, such as a component at the left end of the top row of the wafer, and sequentially analyze data for each component on the wafer. The outlier classification engine 112 may adjust the size and shape of the window as each component is analyzed. Data for the components within the window are then retrieved from memory and analyzed for outliers, for example using only the data for components within the window.

The area within the window may be characterized and/or analyzed according to any suitable criteria or technique. For example, the outlier classification engine 112 may calculate one or more representative neighborhood values for each component based on the data for the surrounding components. A window is applied to each component to generate a neighborhood value for each component for the relevant test. The data for each component may then be analyzed and/or compared to the corresponding neighborhood value to identify hybrid outliers.

In the present embodiment, as the window is applied to each component, the outlier classification engine 112 calculates a neighborhood value for the window corresponding to a derived value derived from values for the components in the neighborhood. For example, the derived value may comprise an average value representing a central value for multiple values, such as an arithmetic mean, statistical median, mode, geometric mean, weighted mean, and the like. The present outlier classification engine 112 employs the median value as the neighborhood value, as the median may often operate as a robust estimator of the center of the local neighborhood. The median value is then stored, for example on a neighborhood value device map storing the neighborhood value calculated for each device. The outlier classification engine 112 moves the window to the next component. In addition, the outlier classification engine 112 may apply different weights to the data of surrounding components, such as according to the proximity of a device to the central device. For example, data for components closer to the central component may have greater effect in calculating the neighborhood value than data for components that are farther away.
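A neighborhood value device map of this kind might be built as in the following Python sketch, which takes the unweighted median of the surrounding components inside a square window. The function name, the fixed square window, the exclusion of the central component, and the (x, y)-keyed dictionary are assumptions for illustration.

```python
import statistics


def neighborhood_map(data, radius=1):
    """Map each device coordinate to the median of its neighbors' values.

    data maps (x, y) coordinates to a test value; devices with no
    neighbors inside the window are omitted from the result.
    """
    result = {}
    for (x, y) in data:
        neighbors = [data[(x + dx, y + dy)]
                     for dx in range(-radius, radius + 1)
                     for dy in range(-radius, radius + 1)
                     if (dx, dy) != (0, 0) and (x + dx, y + dy) in data]
        if neighbors:
            result[(x, y)] = statistics.median(neighbors)
    return result
```

The median keeps a single aberrant device from distorting the neighborhood values of the devices around it, which is the robustness property noted above.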

Hybrid outliers may be identified by analyzing the data for each component in view of the corresponding neighborhood values. Any suitable outlier detection system or approach may be used to identify the hybrid outliers. For example, the outlier classification engine 112 may calculate a unique hybrid outlier threshold based on the neighborhood value for each component and compare the raw data to the calculated threshold. Alternatively, the outlier classification engine 112 may generate a value based on the raw data and the neighborhood value for each component and compare the value to a threshold or the like. Locations of identified hybrid outliers are stored, along with any other relevant information.

In the present embodiment, referring again to FIG. 7, the outlier classification engine 112 generates a residual value for each component according to the raw data and the unique neighborhood values (718) and applies conventional outlier identification techniques to the residuals to identify hybrid outliers (720). The residual component may be determined according to any suitable criteria or technique. In the present embodiment, residuals are calculated using the following equation:

$\mathit{residual} = \frac{\mathit{data}_{norm} - \mathit{data}_{smooth}}{\mathit{RMSE}}$

where data_norm is the normalized raw test data value, data_smooth is the relevant neighborhood value, and RMSE is the square root of the mean square error of values for all relevant points, such as for all devices on the wafer. In the present embodiment, RMSE is the root mean square of the differences between the normalized and smoothed data for each device on the wafer. This equation produces a standardized error, which is approximately normally distributed with a mean of 0 and a variance of 1.
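The residual calculation can be sketched in Python as follows. The function name and dictionary inputs are assumptions, and RMSE is interpreted here as the square root of the mean of the squared differences between normalized and neighborhood values over all devices supplied, which is one reading of the definition above.

```python
import math


def standardized_residuals(data_norm, data_smooth):
    """Return {device: (norm - smooth) / RMSE} for devices present in both maps."""
    diffs = {key: data_norm[key] - data_smooth[key]
             for key in data_norm if key in data_smooth}
    # RMSE over all devices: sqrt(mean of squared differences).
    rmse = math.sqrt(sum(d * d for d in diffs.values()) / len(diffs))
    if rmse == 0:
        return {key: 0.0 for key in diffs}  # data matches neighborhoods exactly
    return {key: d / rmse for key, d in diffs.items()}
```

Dividing by the wafer-wide RMSE puts the residuals for every test on a common scale, so the same outlier limits can be applied regardless of the units or amplitude of the underlying measurement.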

The resulting residual data may be stored, for example in a residual data device map storing residual data values for each device. Outlier detection techniques may be applied to the residual values for the various components to identify outliers. For example, the outlier classification engine 112 may calculate control limits based on the residual values, and the residual values exceeding the limits may be identified as hybrid outliers. As another example, the outlier classification engine 112 may analyze the residual data according to one or more outlier identification algorithms to identify outliers in the residual data. The corresponding devices may then be designated as outliers for the relevant test.

The outlier classification engine 112 may also be configured to perform additional analysis of the hybrid outlier data. For example, the outlier classification engine 112 may identify and categorize selected hybrid outliers and components 106, such as to categorize the components 106 into multiple categories as described previously regarding global outliers. In one embodiment, the hybrid outliers are categorized as small, medium, or large hybrid outliers, for example in conjunction with user-defined criteria; user-defined spatial pattern recognition; categorization of pertinent data for tester data compression; test setup in-situ sensitivity qualifications and analysis; tester yield leveling analysis; dynamic wafer map and/or test strip mapping for part dispositions and dynamic retest; or test program optimization analyses.

The outlier classification engine 112 suitably classifies the hybrid outlier data using a selected set of classification limit calculation methods. Any appropriate classification methods may be used to characterize the hybrid outlier data according to the needs of the operator. The present outlier classification engine 112, for example, categorizes hybrid outliers by comparing the relevant test data or derived data to selected thresholds, such as values corresponding to one, two, three, and six statistically scaled standard deviations from a threshold, such as the test limits or a data mean, mode, and/or median for a relevant geographic area. The categorization of hybrid outliers in this manner tends to normalize any identified hybrid outliers for any test regardless of datum amplitude and relative noise. In one embodiment, the outlier classification engine 112 may categorize hybrid outliers according to the magnitude of the difference between the test data for the central component and the derived value for the plurality of local components. The categorization may be based directly on the difference or indirectly, such as according to a value based on or derived from the difference.

The outlier classification engine 112 analyzes and correlates the normalized hybrid outliers and/or the raw data points based on user-defined rules. The outlier classification engine 112 suitably performs the categorization according to each test, which may be performed independently of data from other tests or in conjunction with such data from other tests. Criteria for small, medium, and large outliers may be selected according to any suitable criteria, such as thresholds based on the test limits and/or characteristics of the data.

In one embodiment, the outlier classification engine 112 may forego the hybrid outlier detection analysis or discard the results of such analysis if the results appear to be unlikely to yield useful information. For example, the outlier classification engine 112 may compare the number of test data points for a particular test to a threshold, such as 20. If there are fewer than the threshold number of test data points for analysis, the outlier classification engine 112 may forego the hybrid outlier analysis. Similarly, if the number of hybrid outliers detected exceeds a threshold, such as 10, the hybrid outlier analysis results may be discarded as generating too many results to be considered hybrid outliers. Alternatively, the outlier classification engine 112 may perform and/or retain the results, but annotate the results to indicate the potentially suspect nature of the data.
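These two sanity checks can be expressed as a small guard function. In this Python sketch, the function name and the status strings are hypothetical; the default limits of 20 data points and 10 outliers are the example values given above.

```python
def review_hybrid_results(n_points, hybrid_outliers, min_points=20, max_outliers=10):
    """Return (status, outliers) after applying the two sanity checks.

    status is "skipped" when too few data points exist for analysis,
    "discarded" when too many outliers were found, and "ok" otherwise.
    """
    if n_points < min_points:
        return "skipped", []
    if len(hybrid_outliers) > max_outliers:
        return "discarded", []
    return "ok", list(hybrid_outliers)
```

The annotating alternative described above could be obtained by returning the original outlier list together with the status instead of an empty list.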

The outlier classification engine 112 may also identify components that are near clusters of failures and outliers. Such components in the same area might exhibit problems similar to the outliers and failures in the area, but the problems may not be identified upon initial analysis of the test data. Thus, the outlier classification engine 112 may identify a “good die in a bad neighborhood” using failure data, global (or single-parameter) outlier data, and/or hybrid (or multiple-parameter) outlier data.

The outlier classification engine 112 may identify relevant neighborhoods and nearby components in any appropriate manner. For example, referring to FIG. 12, the outlier classification engine 112 may initially identify the relevant neighborhoods by analyzing the failure data 1210, global outlier data 1212, and/or hybrid outlier data 1212 to identify groups 1214 of adjacent or otherwise spatially related components. The neighborhoods around the groups 1214 may be defined according to any criteria, such as size, shape, or location of the group 1214. In addition, the neighborhoods may be defined according to one or more types of data, such as groups 1214 of failures 1210, global outliers 1212, or hybrid outliers 1212. In the present embodiment, the outlier classification engine 112 identifies neighborhoods comprising a minimum number of test results for adjacent components 106 having failures 1210, global outliers 1212, and hybrid outliers 1212.

Upon identifying the relevant neighborhoods, the outlier classification engine 112 identifies spatially related test results that do not comprise failures 1210, global outliers 1212, and hybrid outliers 1212. For example, test results for components 106 adjacent at least one, two, or three components 106 in the group 1214, or within a selected distance from the group 1214, may be designated for further analysis. The identified components 106 may be designated, for example as “suspect” components 1216 or the like. The identified suspect components 1216 and test results may be further analyzed, designated as outliers, and/or reported.
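A simplified version of the “good die in a bad neighborhood” check is sketched below in Python. It skips the explicit grouping step and instead counts, for each otherwise good component, how many of its eight adjacent positions hold a failure or outlier; the function name, the set-based inputs, and the default minimum of two adjacent bad components are assumptions for illustration.

```python
# Offsets of the eight positions adjacent to a die on the wafer grid.
ADJACENT = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def suspect_dies(all_dies, bad_dies, min_bad_neighbors=2):
    """Return good dies adjacent to at least min_bad_neighbors bad dies.

    all_dies and bad_dies are sets of (x, y) coordinates; bad_dies holds
    failures, global outliers, and hybrid outliers combined.
    """
    suspects = set()
    for (x, y) in all_dies - bad_dies:
        bad_count = sum((x + dx, y + dy) in bad_dies for dx, dy in ADJACENT)
        if bad_count >= min_bad_neighbors:
            suspects.add((x, y))
    return suspects
```

For a 4 by 4 grid with failures at (0, 0), (0, 1), and (1, 0), only the die at (1, 1) touches two or more bad dies and is designated as suspect.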

The supplementary data analysis element 206 may further include a diagnostic system including a pattern recognition system for identifying process characteristics based on patterns recognized in the test data. The pattern recognition system is suitably configured to receive the data from the various sources and identify patterns in the data. The pattern recognition system is also suitably configured to match the identified patterns with known issues associated with such patterns, for example by assigning a likelihood of a particular issue based on the identified pattern. For example, clusters of devices having similar non-passing bin results or outliers located in the same position on different wafers may indicate a particular problem in the manufacturing process. The pattern recognition system identifies and analyzes patterns in the data that may indicate such issues in the manufacturing and/or test process. The pattern recognition system is described in further detail in U.S. patent application Ser. No. 10/817,750 (Publication No. US-2004-0267477-A1), filed Apr. 2, 2004, the disclosure of which is hereby incorporated by reference. In the present embodiment, the pattern recognition system may identify a repetitive pattern of defects or outliers on a wafer or series of wafers, such as due to reticle or stepper related problems. The pattern recognition system may then use the identified pattern to identify potentially defective or outlier devices on the wafer, even though the devices may seem unaffected.

The computer 108 collects data from the test system 100, suitably at run time or in conjunction with an in-line process, and provides an output report to a printer, database, operator interface, or other desired destination. Any form, such as graphical, numerical, textual, printed, or electronic form, may be used to present the output report for use or subsequent analysis. The output element 208 may provide any selected content, including selected output test data from the tester 102 and results of the supplementary data analysis.

In the present embodiment, the output element 208 suitably provides a selection of data from the output test data specified by the operator as well as supplemental data. The computer 108 may also be configured to include information relating to the outliers, or other information generated or identified by the supplementary data analysis element. If so configured, the identifiers, such as x-y coordinates, for each of the outliers are assembled as well. The coordinates for the operator-selected components and the outliers are merged into an output report which in the current embodiment is in the format of the native tester data output format. Merging resulting data into the dynamic datalog facilitates compression of the original data into summary statistics and critical raw data values into a smaller native tester data file, reducing data storage requirements without compromising data integrity for subsequent customer analysis.

The retrieved information is then suitably stored. The report may be prepared in any appropriate format or manner. In the present embodiment, the output report suitably includes the dynamic datalog having a wafer map indicating the selected components on the wafer and their categorization. Further, the output element 208 may superimpose wafer map data corresponding to outliers on the wafer map of the preselected components. Additionally, the output element may include only the outliers from the wafer map or batch as the sampled output. The output report may also include a series of graphical representations of the data to highlight the occurrence of outliers and correlations in the data.

The output report may further include recommendations and supporting data for the recommendations. For example, if two tests appear to generate identical sets of failures and/or outliers, the output report may include a suggestion that the tests are redundant and recommend that one of the tests be omitted from the test program. The recommendation may include a graphical representation of the data showing the identical results of the tests.

The output report may be provided in any suitable manner, for example by output to a local workstation, transmission to a server, activation of an alarm, or any other appropriate manner (step 712). In one embodiment, the output report may be provided off-line such that the output does not affect the operation of the system or transfer to the main server. In this configuration, the computer 108 copies data files, performs the analysis, and generates results, for example for demonstration or verification purposes.

The particular implementations shown and described are merely illustrative of the invention and its best mode and are not intended to otherwise limit the scope of the present invention in any way. For the sake of brevity, conventional signal processing, data transmission, and other functional aspects of the systems (and components of the individual operating components of the systems) may not be described in detail. Furthermore, the connecting lines shown in the various figures are intended to represent exemplary functional relationships and/or physical couplings between the various elements. Many alternative or additional functional relationships or physical connections may be present in a practical system. The present invention has been described above with reference to a preferred embodiment. Changes and modifications may be made, however, without departing from the scope of the present invention. These and other changes or modifications are intended to be included within the scope of the present invention, as expressed in the following claims.

1. A semiconductor test data analysis system for identifying outliers in semiconductor test data, comprising: a memory configured to store the test data; and a processor connected to the memory, wherein the processor is configured to: retrieve the test data from memory; compare the test data to an upper outlier threshold and a lower outlier threshold, wherein the upper outlier threshold and the lower outlier threshold are derived from a median of a frequency distribution of the test data; and determine whether the test data include outliers according to the comparison of the test data to the upper outlier threshold and the lower outlier threshold.
2. A semiconductor test data analysis system according to claim 1, wherein: the upper outlier threshold is derived from a third quartile mark; and the lower outlier threshold is derived from a first quartile mark.
3. A semiconductor test data analysis system according to claim 1, wherein the thresholds are asymmetric relative to the median.
4. A semiconductor test data analysis system according to claim 1, wherein the processor is configured to adjust the thresholds according to a slope of the frequency distribution.
5. A semiconductor test data analysis system according to claim 1, wherein the processor is configured to categorize the outliers according to a magnitude of the outliers.
6. A semiconductor test data analysis system according to claim 1, wherein the processor is configured to assign categories to the outliers according to a range between at least one of the thresholds and an edge of the distribution.
7. A semiconductor test data analysis system according to claim 1, wherein the processor is configured to determine whether the test data include outliers according to a correlation of data corresponding to at least two parameters.
8. A method of identifying outliers in semiconductor test data, comprising: establishing an upper outlier threshold and a lower outlier threshold according to of the test data, wherein the upper outlier threshold and the lower outlier threshold are derived from a median of a frequency distribution of the test data; comparing the test data to the upper outlier threshold and the lower outlier threshold; and identifying outliers in the test data according to the comparison of the test data to the upper outlier threshold and the lower outlier threshold.
9. A method of identifying outliers according to claim 8, wherein: establishing the upper outlier threshold comprises identifying a third quartile mark; and establishing the lower outlier threshold comprises identifying a first quartile mark.
10. A method of identifying outliers according to claim 8, wherein the thresholds are asymmetric relative to the median.
11. A method of identifying outliers according to claim 8, wherein establishing the upper outlier threshold and the lower outlier threshold includes adjusting the thresholds according to a slope of the frequency distribution.
12. A method of identifying outliers according to claim 8, further comprising categorizing the outliers according to a magnitude of the outliers.
13. A method of identifying outliers according to claim 8, further comprising assigning categories to the outliers according to a range between at least one of the thresholds and an edge of the distribution.
14. A method of identifying outliers according to claim 8, further comprising determining whether the test data include outliers according to a correlation of data corresponding to at least two parameters.
15. A medium storing instructions for causing a computer to execute a process, wherein the process comprises: retrieving a set of semiconductor test data from a memory; establishing an upper outlier threshold and a lower outlier threshold, wherein the upper outlier threshold and the lower outlier threshold are derived from a median of a frequency distribution of the test data; comparing the test data to the upper outlier threshold and the lower outlier threshold; and determining whether the test data include outliers according to the comparison of the test data to the upper outlier threshold and the lower outlier threshold.
16. A medium according to claim 15, wherein: establishing the upper outlier threshold comprises identifying a third quartile mark; and establishing the lower outlier threshold comprises identifying a first quartile mark.
17. A medium according to claim 15, wherein the thresholds are asymmetric relative to the median.
18. A medium according to claim 15, wherein establishing the upper outlier threshold and the lower outlier threshold includes adjusting the thresholds according to a slope of the frequency distribution.
19. A medium according to claim 15, wherein the process further comprises categorizing the outliers according to a magnitude of the outliers.
20. A medium according to claim 15, wherein the process further comprises assigning categories to the outliers according to a range between at least one of the thresholds and an edge of the distribution.
21. A medium according to claim 15, wherein the process further comprises determining whether the test data include outliers according to a correlation of data corresponding to at least two parameters.
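By way of illustration only, and without limiting the claims, the threshold comparison recited above may be sketched as follows. The sketch assumes one possible derivation: each threshold is placed beyond the corresponding quartile mark by a multiple of the distance between that quartile mark and the median, so that the thresholds may be asymmetric relative to the median for skewed data. The function names, the multiplier `k`, the median-of-halves quartile convention, and the "small", "medium", and "large" category boundaries are hypothetical choices made for this example and are not taken from the disclosure.

```python
from statistics import median


def quartile_marks(values):
    """Return (first quartile, median, third quartile).

    Assumption: quartiles are the medians of the lower and upper halves
    of the sorted data (the middle point is excluded when n is odd).
    """
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least four results to estimate quartiles")
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + (n % 2):]
    return median(lower_half), median(data), median(upper_half)


def outlier_thresholds(values, k=1.5):
    """Return (lower threshold, upper threshold) derived from the median.

    Assumption: each side uses its own quartile-to-median distance, so a
    skewed distribution yields thresholds asymmetric about the median.
    """
    q1, med, q3 = quartile_marks(values)
    lower = q1 - 2 * k * (med - q1)
    upper = q3 + 2 * k * (q3 - med)
    return lower, upper


def categorize(value, lower, upper, med):
    """Label a value by how far it lies beyond the nearer threshold.

    Assumption: magnitude is measured relative to the distance between
    the median and that threshold; the 0.5 and 1.0 cutoffs are arbitrary.
    """
    if lower <= value <= upper:
        return "inlier"
    if value > upper:
        excess, span = value - upper, upper - med
    else:
        excess, span = lower - value, med - lower
    if span == 0:
        return "large"  # degenerate distribution: any deviation is extreme
    ratio = excess / span
    if ratio <= 0.5:
        return "small"
    if ratio <= 1.0:
        return "medium"
    return "large"


def find_outliers(values, k=1.5):
    """Return (value, category) pairs for results outside the thresholds."""
    lower, upper = outlier_thresholds(values, k)
    med = median(values)
    return [
        (v, categorize(v, lower, upper, med))
        for v in values
        if not lower <= v <= upper
    ]
```

For symmetric data, where the median lies midway between the quartile marks, this rule reduces to the familiar interquartile-range fences (first quartile minus `k` times the interquartile range, and third quartile plus `k` times the interquartile range). Adjusting the thresholds according to the slope of the frequency distribution, and correlating data for two or more parameters, are not covered by this sketch.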